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Abstract 

We introduce novel results for approximate inference on planar graphical models using the 
loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows to 
express the exact partition function of a graphical model as a finite sum of terms that can 
be evaluated once the belief propagation (BP) solution is known. In general, full summation 
over all correction terms is intractable. We develop an algorithm for the approach presented 
in Chertkov et al. (2008) which represents an efhcient truncation scheme on planar graphs 
and a new representation of the series in terms of PfafRans of matrices. We analyze the 
performance of the algorithm for the partition function approximation for models with 
binary variables and pairwisc interactions on grids and other planar graphs. We study in 
detail both the loop series and the equivalent Pfaflian series and show that the first term 
of the Pfaflian series for the general, intractable planar model, can provide very accurate 
approximations. The algorithm outperforms previous truncation schemes of the loop series 
and is competitive with other state-of-the-art methods for approximate inference. 

Keywords: belief propagation, loop calculus, approximate inference, partition function, 
planar graphs. 



1. Introduction 

Graphical models are popular tools widely used in many areas which require modeling of 
uncertainty. They provide an effective approach through a compact representation of the 
joint probability distribution. The two most common types of graphical models are Bayesian 
Networks (BN) and Markov Random Fields (MRFs). 

The partition function of a graphical model, which plays the role of normalization con- 
stant in a MRF or probability of evidence (likelihood) in a BN is a fundamental quantity 
which arises in many contexts such as hypothesis testing or parameter estimation. Exact 
computation of this quantity is only feasible when the graph is not too complex, or equiv- 
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alently, when its tree-width is small. Currently many methods are devoted to approximate 
this quantity. 

The belief propagation (BP) algorithm (Pearl, 1988) is at the core of many of these 
approximate inference methods. Initially thought as an exact algorithm for tree graphs, it 
is widely used as an approximation method for loopy graphs (Murphy et al., 1999; Prey and 
MacKay, 1998). The exact partition function is explicitly related to the BP approximation 
through the loop calculus framework introduced by Chertkov and Chernyak (2006a). Loop 
calculus allows to express the exact partition function as a finite sum of terms (loop series) 
that can be evaluated once the BP solution is known. Each term maps uniquely to a sub- 
graph, also denoted as a generalized loop, where the connectivity of any node within the 
subgraph is at least degree two. Summation of the entire loop series is a hard combina- 
torial task since the number of generalized loops is typically exponential in the size of the 
graph. However, different approximations can be obtained by considering different subsets 
of generalized loops in the graph. 

It has been shown empirically (Gomez et al., 2007; Chertkov and Chernyak, 2006b) that 
truncating this series may provide efficient corrections to the initial BP approximation. 
More precisely, whenever BP performs satisfactorily which occurs in the case of sufficiently 
weak interactions between variables or short-range influence of loops, accounting for only a 
small number of terms is sufficient to recover the exact result (Gomez et al., 2007). On the 
other hand, for those cases where BP requires many iterations to converge, many terms of the 
series are required to improve substantially the approximation. A formal characterization 
of the classes of tractable problems via loop calculus still remains as an open question. 

A step toward this goal has been done in Chertkov et al. (2008) where it was shown 
that for any graphical model, summation of a certain subset of terms can be mapped to a 
summation of weighted perfect matchings on an extended graph. For planar graphs (graphs 
that can be embedded into a plane without crossing edges), summation of the subset can be 
performed in polynomial time evaluating the Pfaffian of a skew-symmetric matrix associated 
with the extended graph. Furthermore, the full loop series can be expressed as a sum over 
certain Pfaffian terms, where each Pfaffian term accounts for a large number of loops and 
is solvable in polynomial time as well. 

The approach of Chertkov et al. (2008) builds on classical results from 1960s by Kaste- 
leyn (1963); Fisher (1966) and other physicists who addressed the question of counting the 
number of perfect matchings on a planar grid, also known as the dimer problem in the 
statistical physics literature (a dimer correspond to a colored edge of the graph, and a valid 
dimer configuration consists of exactly one dimer per any edge of the graph). The key result 
of Kasteleyn (1963); Fisher (1966) can be summarized as follows: the partition function of 
a planar graphical model defined in terms of binary variables can be mapped to a weighted 
perfect matching problem and calculated in polynomial time under the restriction that in- 
teractions only depend on agreement or disagreement between the signs of their variables. 
Such a model is known in statistical physics as the Ising model without external field. Notice 
that exact inference for a general binary graphical model on a planar graph (that is Ising 
model with external field) is intractable (Barahona, 1982). 

Recently, some methods for inference over graphical models, based on the works of 
Kasteleyn and Fisher, have been introduced. Globerson and Jaakkola (2007) obtained 
upper bounds on the partition function for non-planar graphs with binary variables by 



2 



Approximate inference on planar graphs using Loop Calculus and BP 



decomposition of the partition function into a weighted sum over partition functions of 
spanning tractable (zero field) planar models. The resulting problem is a convex optimiza- 
tion problem and, since exact inference can be done in each planar su6-model, the bound 
can be calculated in polynomial time. 

Another example is the work of Schraudolph and Kamenetsky (2008) which provides 
a framework for exact inference on a restricted class of planar graphs using the approach 
of Kasteleyn and Fisher. More precisely, they showed that any joint probability function 
defined on binary variables can be expressed in a functional form without external fields by 
adding a new auxiliary node linked to all the existing nodes. Under this transformation, 
single-variable external fields can be allowed for a subset B of variables. If the graphical 
model is ;S— outerplanar, which means that there exists a planar embedding in which the 
subset B of the nodes lie on the same face, the techniques of Kasteleyn and Fisher can still 
be applied. 

Contrary to the two aforementioned approaches which rely on exact inference on a 
tractable planar model, the loop calculus directly leads to a framework for approximate 
inference on general planar graphs. Truncating the loop series according to Chertkov et al. 
(2008) already gives the exact result in the zero external field case. In the general pla- 
nar case, however, this truncation may result into an accurate approximation that can be 
incrementally corrected by considering subsequent terms in the series. 

In the next Section we review the main theoretical results of the loop calculus ap- 
proach for planar graphs and introduce the proposed algorithm. In Section 3 we provide 
experimental results on approximation of the partition function for regular grids and other 
types of planar graphs. We focus on a planar-intractable binary model with symmetric 
pairwise interactions but nonzero single variable potentials. The source code used to derive 
these results is freely available at http : //www . mbf ys . ru . nl/staf f /v . gomez/. We end this 
manuscript with conclusions and future work in Section 4. 

2. Belief Propagation and loop Series for Planar Graphs 

We consider the Forney graph representation, also called general vertex model (Forney, 
2001; Loeliger, 2004), of a probability distribution p{ct) defined over a vector a of binary 
variables (vectors are denoted using bold symbols). Forney graphs are associated with 
general graphical models which subsume other factor graphs, e.g. those correspondent to 
BNs and MRFs. In Appendix A we show how to convert a factor graph model to its 
equivalent Forney graph representation. 

A binary Forney graph Q := {V,£) consists of a set of nodes V where each node a G V 
represents an interaction and each edge (a, b) G £ represents a binary variable ab which take 
values aab '■= {±1}- We denote a the set of neighbors of node a. Interactions f^^ (cr are 
arbitrary functions defined over typically small subsets of variables where (Tq is the vector 
of variables associated with node a, i.e. Ca '■= (o"a6i, Cabj, . . . ) where bi G a. 

The joint probability distribution of such a model factorizes as: 






(1) 



aev 



where Z is the normalization factor, also called the partition function. 
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Prom a variational perspective, a fixed point of the BP algorithm represents a stationary 
point of the Bcthc "free energy" approximation under proper constraints (Yedidia et al., 
2000). In the Forney style notation: 

Z^^ = exp(-F^^), (2) 

F^'' = EE (^») 1^ (¥r^) -EE^^o (^«'') ^^bab M, 

„ \Ja\P^a) J I, „ 

where ba{(Ta) and hah{,CFab) are the beliefs (pseudo-marginals) associated to each node a G V 
and variable ah. For graphs without loops, Equation (2) coincides with the Gibbs "free 
energy" and therefore Z^^ coincides with the exact partition function Z . If the graph con- 
tains loops, Z^^ is just an approximation critically dependent on how strong the influence 
of the loops is. 

We introduce now some convenient definitions related to the loop calculus framework. 

Definition 1 A generalized loop in a graph Q = {V,£) is any subgraph C = {V',E'), 
V C V, -E" C iy' X V') n £ such that each node in V has degree two or larger. 

For simplicity, we will use the term "loop", instead of "generalized loop", in the rest of this 
manuscript. Loop calculus allows to represent Z explicitly in terms of the BP approximation 
via the loop series expansion: 



Z = Z''''-z, z=ll + Y,rc], rc=llfia;ac, (3) 

V cec J aeC 

where C is the set of all the loops within the graph. Each loop term rc is a product of 
terms lJ.a,ac associated with every node a of the loop, ac denotes the set of neighbors of a 
within the loop C: 

XI ^"(^^a) n {(^ab-rriab) 

fJ-aiac = — ''^/"^ > ^ab = E '^o.bKb {(^ab)- (4) 

T{\l'^-Kb 

beac 

In this work we consider planar graphs where all nodes are of degree not larger than three, 
that is I ac I < 3. We denote by triplet a node with degree three in the graph. In Appendix 
A we show that a graphical model can be converted to this representation at the cost of 
introducing auxiliary nodes. 

Definition 2 A 2-regular loop is a loop in which all nodes have degree exactly two. 

Definition 3 The 2-regular partition function Z^ is the truncated form of (3) which 
sums all 2-regular loops only: ^ 

Z^ = Z''P-z^, 20 = 1+ ''C- (5) 

CeCs.t.|oo|=2,VaeC 
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Figure 1: Example, (a) A Forney graph, (b) Corresponding extended graph, (c) Loops (in 
bold) included in the 2-regular partition function, (d) Loops (in bold and red) not 
included in the 2-regular partition function. Marked in red, the triplets associated 
with each loop. Grouped in gray squares, the loops considered in different subsets 
^' of triplets: (d.l) ^' = {c,h}, (d.2) ^ = {e,l}, (d.3) ^' = {h,l}, (d.4) ^ = {c, e} 
and (d.4) ^ = {c, e, h, 1} (see Section 2.2). 



As an example. Figure la shows a small Forney graph and Figure Ic shows seven loops 
found in the corresponding 2-regular partition function. 

2.1 Computing the 2-regular Partition Function Using Perfect Matching 

In Chertkov et al. (2008) it has been shown that computation of Z0 can be mapped to a 
dimer /matching problem, or equivalently, to the computation of the sum of all weighted 
perfect matchings within another graph. A perfect matching is a subset of edges such that 

1. Notice that this part of the series was called single- connected partition function in Chertkov et al. 
(2008). Here we prefer to call it 2-regular partition function because loops with more than one connected 
component are also included in this part of the series. 
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Figure 2: Fisher's rules. (Top) A node a of degree two in Q is split in two nodes in Gext- 
(Bottom) A node a of degree three in G is split in three nodes in Qext- The 
squares on the right indicate all possible matchings in Q^xt related with node a. 
Note that the rules preserve planarity. 



each node neighbors exactly one edge from the subset. The weight of a matching is the 
product of weights of edges in the matching. The key idea of this mapping is to extend the 
original Forney graph Q into an new graph Q^xt '■= O^Gext^^Gcxt) such a way that each 
perfect matching in Qext corresponds to a 2-regular loop in Q. (See Figures lb and c for an 
illustration). Under the condition of planarity, the sTim of all weighted perfect matchings 
can be calculated in a polynomial time following Kastelcyn's arguments. Here we reproduce 
these results with little variations and more emphasis on the algorithmic aspects. 

Given a Forney graph Q and the BP approximation, we simplify Q and obtain the 2-core 
by removing nodes of degree one recursively. After this step, Q is either the null graph (and 
then BP is exact) or it is only composed of vertices of degree two or three. 

To construct the extended graph Q^xt we split each node in Q according to the rules 
introduced by Fisher (1966) and illustrated in Figure 2. The procedure results in an ex- 
tended graph of |Vg^^j| < 3|V| nodes and \£g^^f \ < 3|f | edges. It is easy to verify that each 
2-regular loop in Q is associated with a perfect matching in Q^xt and, furthermore, this 
correspondence is unique. Consider, for instance, the vertex of degree three in the bottom 
of Figure 2. Given a 2-regular loop C, vertex a can appear in four different configurations: 
either node a does not appear in C, or C contains one of the following three paths: -b-a-c-, 
-b-a-d- or -c-a-d-. These four cases correspond to node terms in a loop with values 1, /Xa;{6,c}5 
Ha;{b,d} f^a:{c,d} respectively, and coincide with the matchings in Q^xt shown within the 
box on the bottom-right. An simpler argument applies to the vertex of degree two from the 
top portion of Figure 2. 
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Therefore, if we associate to each internal edge (new edge in Qg^t not in Q) of each split 
node a the corresponding term Ha-ac of Equation (4) and to the external edges (existing 
edges already in Q) weight 1, then the sum over all weighted perfect matchings defined on 
Qext is precisely ^g. The 2-regular partition function is obtained using Equation (5). 
Equivalently: 

20 = perfect matchings in Qext- 

Kasteleyn (1963) provided a method to compute this sum in polynomial time for planar 
graphs. We follow his approach. First, we create a planar embedding of Qext- A planar 
embedding of a graph divides the plane into disjoint regions that are bounded by sequences 
of edges in the graph. The regions are called faces. Second, we orient the edges of the 
planar embedding in such a way that for every face (except possibly the unbounded or 
external face) the number of clockwise oriented edges is odd. Algorithm 1 produces such 
an orientation (Karpinski and Rytter, 1998). It receives an undirected graph Qext and 
constructs a copy Q'e^^ := O^g'^^t: ^Q'^^J with properly oriented edges £g'^^^- 

It is convenient that Qext is bi-connected, i.e. it has no articulation points. If needed, we 
add dummy edges with zero weight which do not alter the partition function or the original 
model. 

Algorithm 1 Pfaffian orientation 

Arguments: undirected bi-conncctcd extended graph Qext- 
1: Construct a planar embedding Qext of Qext- 
2: Construct a spanning tree T of Qext- 

3: Construct a graph H having vertices corresponding to the faces of Qext- 

Connect two vertices in H if the respective face boundaries share an edge not in T. 
H is a tree. Root H to the external face. 

4: Q'ext ■■= T- 

5: Orient all edg es in Qext ci-rbitrarily. 

6: for all face (vertex in H) traversed in post-order do 

7: Add to Q'ext the unique edge not in Q'ext- 

8: Orient it such that the number of clock-wise oriented edges is odd. 

9: end for 

10: RETURN directed bi-connected extended graph Q'ext- 



Finally, denote /Xjj the weight of the edge between nodes i and j in Q'ext- We create the 
following skew-symmetric matrix A = —A^: 



Aij 



if (i, j) G £g>^ 
-Hij if (j, i) G £g>^ 
otherwise 



This matrix is known as the Tutte matrix of Q'^,^^^ and the Pfaffian of A gives the desired 

sum up to the overall sign. The Pfaffian oi A = ±^T)ct{A). However, Z0 can be either 
positive or negative, and computing the value of the Pfaffian with the sign yet uncertain 
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is not sufficient. Furthermore, since eacli element Aij can be negative not only due to the 
Pfaffian orientation but also if fiij is negative, the sign of the Pfaffian needs to be corrected. 
This problem is fixed with the help of the original Kasteleyn's binary matrix: 



Bij = < 



f+1 ii{i,j)eSg,^^ 
-1 ii{j,i)eSg,^ 
otherwise 



If the sign of Pf{B) is negative then the sign of Pf(^) is changed. Notice that the 
absolute value of Pf(B) coincides with the number of perfect matchings or the number 
of loops included in the sum if no additional edges have been added. The sign of Pf[B) 
represents the correction. Therefore, tlic corrected value of Z0 is: 

Z0 = sign(pf(i3)) -Pf^i 

Calculation of can therefore be performed in time 0{N^) where N is the number of 
nodes of Qext (Galbiati and Maffioli, 1994). For the special case of binary planar graphs with 
zero local fields the 2-regular partition function coincides with the exact partition function 
Z = Zii, = Z^^ ■ z$ since the other terms in the loop series vanish. 



2.2 Computing the Full Loop Series Using Perfect Matching 

Chertkov et al. (2008) established that 2:0 is just the first term of a finite sum involving 
Pfaffians. We briefly reproduce their results here and provide an algorithm for computing 
the full loop scries as a Pfaffian scries. 

Consider T defined as the set of all possible triplets (vertices with degree three in the 
original graph Q). For each possible subset ^' G 7", including an even number of triplets, 
there exists a unique correspondence between loops in Q including the triplets in and 
perfect matchings in another extended graph Gextij, constructed after removal of the triplets 
'I' in Q. Using this representation the full loop series can be represented as a Pfaffian series, 
where each term Zij is tractable and is a product of the respective Pfaffian and the 
terms associated with each triplet of ^': ^ 

Z = '^Zq, ZySf = Z^Yl Ma;a (6) 

z^ = sign (pi (-Bvi/)) • Pf (ivp) . 

The 2-regular partition function thus corresponds to ^ = 0. We refer to the remaining 
terms of the series as higher order Pfaffian terms. Notice that matrices Aq, and depend 
on the removed triplets and therefore each requires different matrices and different 
edge orientations. In addition, after removal of vertices in Q the resulting graph may be 
disconnected. As before, in these cases we add dummy edges to Qext with zero weight to 
make the graph bi-connected again. 

2. We omit the loop index in the triplet term Ha-d because nodes have at most degree three and therefore 
the set a always coincide in all loops which contain that triplet. 
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Figure Id shows loops corresponding to the higher order Pfaffian terms on our illustrative 
example. The first and second subsets of triplets (d.l and d.2) include summation over two 
loops whereas the remaining Pfaffian terms include uniquely one loop. 

Exhaustive enumeration of all the subsets of triplets leads to a 2'-^l time algorithm, which 
is prohibitive. However, many triplet combinations may lead to forbidden configurations. 
Experimentally, wc found that a principled way to look for higher order Pfaffian terms with 
large contribution is to search first for pairs of triplets, then groups of four, and so on. For 
large graphs, this also becomes intractable. Actually, the problem is very similar to the 
problem of selecting loop terms rc with largest contribution. The advantage of the Pfaffian 
representation, however, is that Z0 is always the Pfaffian term that accounts for the largest 
number of loop terms and is the most contributing term in the series. In this work we do 
not derive any heuristic for searching Pfaffian terms with larger contributions. Instead, in 
Section 3.1 we study the full Pfaffian series and subsequently we restrict ourselves on the 
accuracy of ^0 . 

Algorithm 2 describes the full procedure to compute all terms using the representation 
of expression (6). The main loop of the algorithm can be interrupted at any time, thus 
leading to a sequence of algorithms producing corrections incrementally. 



Algorithm 2 Pfaffian series 
Arguments: Forney graph Q 
1: z:=0. 

2: for all G T) do 

3: Build extended graph Qext^^, applying rules of Figure 2. 

4: Set Pfaffian orientation in Qext^ according to Algorithm 1 
5: Build matrices A and B. 

6: Compute Pfaffian with sign correction 2;^ according to Equation (3). 
7: z:=z + Zii Hae^P 
8: end for 

9: RETURN ■ z 



3. Experiments 

In this Section wc study numerically the proposed algorithm. To facilitate the evaluation 
and the comparison with other algorithms we focus on the binary Ising model, a particular 
case of the model (1) where factors only depend on the disagreement between two variables 
and take the form fa {<7ab,(^ac) = {Ja;{ab,ac}'^ab'^ac) ■ We consider also nonzero local 
potentials parametrized by fa {(^ah) = exp {Ja\{ab}(^ah) in sl\ variables so that the model 
becomes planar-intractable. 

We create different inference problems by choosing different interactions {J a;{ab,ac}} ^-^id 
local field parameters {Ja-{ab}}- To generate them we draw independent samples from a 
Normal distribution {Ja-{ab,ac}] ~ -^(0, /3/2) and {Ja-{ab}} ~ -^(0, where and (3 
determine how difficult the inference problem is. Generally, for = the planar problem is 
tractable. For > 0, small values of (3 result in weakly coupled variables (easy problems) 
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and large values of (5 in strongly coupled variables (hard problems). Larger values of 6 
result in easier inference problems. 

In the next Subsection we analyze the full Pfaffian series using a small example and 
compare it with the original representation based on the loop series. Next, we compare our 
algorithm with the following ones: ^ 

Truncated Loop-Series for BP (TLSBP) (Gomez et al., 2007), which accounts for a 
certain number of loops by performing depth-first-search on the factor graph and then 
merging the found loops iteratively. We adapted TSLBP as an any-time algorithm 
(anyTLSBP) such that the length of the loop is used as the only parameter instead of 
the two parameters S and M (see Gomez et al. (2007) for details). This is equivalent 
to setting M = and discard S. In this way, anyTLSBP does not compute all possible 
loops of a certain length (in particular, complex loops ^ are not included), but is more 
efficient than TLSBP. 

Cluster Variation Method (CVM-Loopk) A double-loop implementation of CVM (Hes- 
kes et al., 2003). This algorithm is a special case of generalized belief propagation 
(Yedidia et al., 2005) with convergence guarantees. We use as outer clusters all (max- 
imal) factors together with loops of four (k=4) or six (k=6) variables in the factor 
graph. 

Tree-Structured Expectation Propagation (TreeEP) (Minka and Qi, 2004). This 
method performs exact inference on a base tree of the graphical model and approxi- 
mates the other interactions. The method yields good results if the graphical model 
is very sparse. 

When possible, we also compare with the following two variational methods which provide 
upper bounds on the partition function: 

Tree Reweighting (TRW) (Wainwright et al., 2005) which decomposes the parametriza- 
tion of a probabilistic graphical model as a mixture of spanning trees of the model, 
and then uses the convexity of the partition function to get an upper bound. 

Planar graph decomposition (PDC) (Globerson and Jaakkola, 2007) which decom- 
poses the parametrization of a probabilistic graphical model as a mixture of tractable 
planar graphs (with zero local field). 

To evaluate the accuracy of the approximations we consider errors in Z and, when possible, 
computational cost as well. As shown in Gomez et al. (2007), errors in Z, obtained from 
a truncated form of the loop series, are very similar to errors in single variable marginal 
probabilities, which can be obtained by conditioning over the variables under interest. We 
only consider tractable instances for which Z can be computed via the junction tree al- 
gorithm (Lauritzen and Spiegelhalter, 1988) using 8GB of memory. When studying the 

3. We use the libDAI library (Mooij, 2008) for algorithms CVM-Loopk, TreeEP and TRW. 

4. A complex loop is defined as a loop which can not be expressed as the union of two or more circuits or 
simple loops. 
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scalability of the approaches, we Given an approximation Z' of Z, the error measure used 
in this manuscript is: 

, |logZ-logZ'| 

errorZ = . 

logZ 

As in Gomez et al. (2007), we use four different message updates for BP: fixed and ran- 
dom sequential updates, parallel (or synchronous) updates, and residual belief propagation 
(RBP), a method proposed by Elidan et al. (2006) which selects the next message to be up- 
dated which has maximum residual, a quantity defined as an upper bound on the distance 
of the current messages from the fixed point. We report non-convergence when none of 
the previous methods converged. Wc report convergence at iteration t when the maximum 
absolute value of the updates of all the messages from iteration i — 1 to i is smaller than a 
threshold 7? = 10" 

3.1 Full Pfaffian Series 

In the previous Section we have described two equivalent representations for the exact 
partition function in terms of the loop series and the Pfaffian series. Here we analyze 
numerically how these two representations differ using an example, shown in Figure 3 as 
a factor graph, for which all terms of both series can be computed. We analyze a single 
instance, parametrized using @ = 0.1 and different pairwise interactions /3 G {0.1,0.5, 1.5}. 




Figure 3: Planar bipartite factor graph used to compare the full Pfaffian series with the 
loop series. Circles and black squares denote variables and factors respectively. 



We use TLSBP to retrieve all loops, 8123 for this example, and Algorithm 2 to compute 
all Pfaffian terms. To compare the two approximations we sort all contributions, either 
loops or Pfaffians, by their absolute values in descending order, and then analyze how the 
errors are corrected as more terms are included in the approximation. We define partition 
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BP error 

• Zb error 




10° 10^ 10*10° 10^ 10* 10° 10' 10^ 



I (loop terms) p (pfaffian terms) p (pfaffian terms) 

Figure 4: Comparison between the full loop series and the full Pfaffian series. Eaeh row 
corresponds to a different value of the interaction strength (3. Left column shows 
the error, considering loop terms Z'^^^^^{1) in log-log scale. Shaded regions 
include all loop terms (not necessarily 2-regular loops) required to reach the 
same (or better) accuracy than the accuracy of the 2-regular partition function 
Zg. Middle column shows the error considering Pfaffian terms Z^^{p) also in 
log-log scale. The first Pfaffian term corresponds to Zg,, marked by a circle. Right 
column shows the values of the first 100 Pfaffian terms sorted in descending order 
in \Z^\ and excluding 20. 



functions for the truncated series in the following way: 

V i=l...l J \i=l...p ) 

Then Z'^^^^^{1) accounts for the I most contributing loops and Z^^ {p) accounts for the p 
most contributing Pfaffian terms. In all cases, the Pfaffian term with largest absolute value 
Z-i^i^ corresponds to 20. 

Figure 4 shows the error Z^^^^^ (first column) and Z^^ (second column) for both 
representations. For weak interactions {(5 = 0.1) BP converges fast and provides an accurate 
approximation with an error of order 10~^. Summation of less than 50 loop terms (top-left 
panel) is enough to obtain machine precision accuracy. Notice that the error is almost 
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reduced totally with the 20 correction (top-middle panel). In this scenario, higher order 
terms of the Pfaffian series are negligible (top-right panel). 

As we increase /3, the quality of the BP approximation decreases. The number of loop 
corrections required to correct the BP error then increases. In this example, for intermediate 
interactions {(3 = 0.5) the first Pfaffian term ^0 improves considerably, more than five orders 
of magnitude, on the BP error, whereas approximately 100 loop terms are required to achieve 
a similar correction (gray region of middle- left panel). 

For strong interactions (/? = 1.5) BP converges after many iterations and gives a poor 
approximation. In this scenario also a larger proportion of loop terms (bottom-left panel) 
is necessary to correct the BP result up to machine precision. Looking at the bottom-left 
panel we find that approximately 200 loop terms are required to achieve the same correction 
as the one obtained by z^. The Zf^ is quite accurate (bottom-middle panel). 

As the right panels show, higher order Pfaffian contributions change progressively from 
a flat sequence of small terms to an alternating sequence of positive and negative terms 
which grow in absolute value as /? increases. These oscillations are also present in the loop 
series expansion. 

In general, we conclude that the ^0 correction to the BP approximation can give a 
significant improvement even in hard problems for which BP converges after many iterations. 
Notice again that calculating Z0, the most contributing term in the Pfaffian series, does not 
require explicit search of loop or Pfaffian terms. 

3.2 Grids 

After analyzing the full Pfaffian scries on a small random example we now restrict our at- 
tention to the Zij) approximation using Ising grids (nearest neighbor connectivity). First, we 
compare that approximation with other inference methods for different types of interactions 
(attractive or mixed) and then study the scalability of the method in the size of the graphs. 

3.2.1 Attractive Interactions 

We first focus on binary models with interactions that tend to align the neighboring variables 
to the same value, Ja;{ab,ac} 

> 0. If local fields are also positive Ja;{ab} > 0, Va € V, Sudderth 
et al. (2008) showed that, under some additional condition, the BP approximation is a lower- 
bound of the exact partition function and all loops (and therefore Pfaffian terms too) have 
the same sign^. Although this is not formally proved for general models with attractive 
interactions regardless of the sign of the local fields, numerical results suggest that this 
property holds as well for this type of models. 

We generate grids with positive interactions and local fields, that is |{ Ja;6c}| ~ -^(0, /3/2) 
and |{Ja;6}| ~ AA(O,/30), and study the performance for various values of /3, as well as for 
strong = 1 and weak = 0.1 local fields. 

Figure 5 shows the average error over 50 instances reported by different methods. Using 
this setup, BP converged in all instances using sequential updates of the messages. The error 
curves of all methods show an initial growth and a subsequent decrease, a fact explained 
by the phase transition occurring in this model for = and /3 « 1 (Mooij and Kappen, 

5. The condition is that all single variable beliefs at the BP fixed point must satisfy niab = babi+i) — 
bab{-l) > 0,V(a,6) e£ 
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(a) Attractive interactions, strong local fields: 0=1 (b) Attractive interactions, weak local fields: = 0.1 

10° I ^ I ' ' I ' ' ' ~ 




Figure 5: 7x7 grid attractive interactions and positive local fields. Error averages over 50 
random instances in function of the difficulty of the problem, (a) Strong local 
fields, (b) Weak local fields. 



2005). As the difference between the two plots suggest, errors are larger as approaches 
zero. Notice, that Z^j = Z for the limit case of = 0. 

We observe that in all instances ^0 always improves over the BP approximation. Cor- 
rections are most significant for weak interactions (3 < 1 and strong local fields. For strong 
interactions (3 > \ and weak local fields the improvement is less significant. 

It appears that the Z0 approximation performs better than TreeEP in all cases except 
for very strong couplings, where they show very similar results. Interestingly, Z0 performs 
very similar to CVM-Loop4 which is known to be a very accurate approximation for this 
type of model, see Yedidia et al. (2000) for instance. We observe that in order to obtain 
better average results than Z0 using CVM, we need to select larger outer-clusters such as 
loops of length 6, which increases dramatically the computational cost. 

The methods which provide upper bounds on Z (PDC and TRW) report the largest 
average error. PDC performs slightly better than TRW, as was shown in Globerson and 
Jaakkola (2007) for the case of mixed interactions. We remark that the worse performance 
of PDC for stronger couplings and weak local fields might be attributed to implementation 
artifacts, since for /3 > 4 we have numerical precision errors. In general, both upper bounds 
are significantly less tight than the lower bounds provided by BP and Z^^. 
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Figure 6: 7x7 grid mixed interactions. Error averaged over 50 random instances as a func- 
tion of the problem difficulty for (a) strong local fields and (b) weak local fields. 
Bottom panels show percentage of cases when BP converges for (c) strong local 
fields and (d) weak local fields. 



3.2.2 Mixed Interactions 

We now analyze a more general Ising grid model where interactions and local fields can 
have mixed signs. In that case, Z^^ and Z0 are no longer lower bounds on Z and loop 
terms can be positive or negative. Figure 6 shows results using this setup. Top panels show 
average errors and bottom panels show percent of instances in which BP converged using 
at least one of the methods described above. 

For strong local fields (subplots a,c), we observe that Z^ improvements over BP results 
become less significant as /3 increases. It is important to note that Z^ always improves on 
the BP result, even when the couplings are very strong (/3 = 10) and BP fails to converge 
for a small percentage of instances. performs slightly better than CVM-Loop4 and 
substantially better than TreeEP for small and intermediate /? . All three methods show 
similar results for strong couplings /? > 2. As in the case of attractive interactions, the best 
results are attained using CVM-loop6. 

For the case of weak local fields (subplots b,d), BP fails to converge near the transition 
to the spin-glass phase. For /3 = 10, BP converges only in less than 25% of the instances. 
In the most difficult domain, /3 > 22, all methods under consideration give similar results 
(all comparable to BP). Moreover, it may happen that Zg degrades the Z^^ approxima- 
tion because loops of alternating signs have major infiuence in the series. This result was 



15 



Gomez, Kappen and Chertkov 



Ising grids: p = 1 = 0.01 




N N 

Figure 7: Results on regular grids: scaling with grid size for strong interactions /3 = 1 and 
very weak local fields @ = 0.01. BP converged in all cases, (a) Error medians 
over 50 instances, (b) Cpu time (log-scale). 



also reported in Gomez et al. (2007) when loop terms, instead of Pfaffian terms, where 
considered. 

3.2.3 Scaling with Graph Size 

We now study how the accuracy of the approximation changes as we increase the size 
of the grid. We generate random grids with mixed couplings for \/iV = {4, ...,18} and 
focus on a regime of very weak local fields = 0.01 and strong couplings /? = 1, a difficult 
configuration according to the previous results. We compare also with anyTLSBP, a 
variant of our previous algorithm for truncating the loop series. We run anyTLSBP by 
selecting loops shorter than a given length, and the length is increased progressively. To 
provide a fair comparison between both methods, we run anyTLSBP for the same amount 
of cpu time as the one required to obtain Z0. 

Figure 7a shows the errors of different methods. Since variability in the errors is larger 
than before, we take the median for comparison. In order of increasing accuracy we get 
BP, TreeEP, anyTLSBP, CVM-Loop6, CVM-Loop4 and Z0. We note that larger clusters 
in CVM does not necessarily result in better performance. 

Overall, we can see that results are roughly independent of the network size N in almost 
all methods that we compare. The error of anyTLSBP starts being the smallest but soon 
increases because the proportion of loops captured decreases very fast. For N > 64, 
anyTLSBP performs worse than CVM. The Zg correction, on the other hand, stays flat and 
we can conclude that it scales reasonably well. Interestingly, although Z^j and TLSBP use 
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Figure 8: Two examples of planar graphs used for comparison between methods. We fix 
the number of concentric polygons to 9 and change the degree d of the central 
node within the range [3, ...,25]. (left) Graph for d = 3. (right) Graph for 
d = 25. Here nodes represent variables and edges pairwise interactions. We also 
add external fields which depend on the state of each nodes (not drawn). 



different ways to truncate the loop series, both methods show similar scaling behavior for 
large graphs. 

Figure 7b shows the cpu time for all the tested approaches averaged over all cases. 

Concerning the approximate inference methods, in order of increasing cost, we have BP, 
TrceEP, CVM-Loop4, with anyTLSBP, and CVM-Loop6. Although the cpu time re- 
quired to compute scales with 0{Nq^^^), its curve shows the steepest growth. We discuss 
how to correct this caveat in Section 4. The cpu time of the junction tree method quickly 
increases with the tree-width of the graphs. For large enough iV, exact solution via the 
junction tree method is no longer feasible because of its memory requirements. In contrast, 
for all approximate inference methods, memory demands do not represent a limitation. 

3.3 Radial grid graphs 

In the previous subsection we analyzed the quality of the ^0 correction for graphs with a 
regular grid structure. Here, we carry over the analysis of the Z^j^ correction using planar 
graphs which consist of concentric polygons with a variable number of sides. Figure 8 
illustrates these spider-web like graphs. We generate them as factor graphs with pairwise 
interactions which we subsequently convert to an equivalent Forney graph. (See Appendix 
A for details). Again, local field potentials are parametrized using = 0.01 and interactions 
using /3 = 1. We analyze the error in Z as a function of the degree d of the central node. 

Figure 9a shows the median of errors in Z of 50 random instances. First, we see that 
the variability of all methods, in particular anyTLSBP, is larger than in the regular grid 
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5 10 15 20 25 5 10 15 20 25 

d d 

Figure 9: Results on spider-web like graphs: scaling with the degree of the central node 
for f3 = 1 and G = 0.01. BP converged in all cases, (a) Error medians over 50 
instances, (b) Cpu time (log-scale). 



scenario. Also, the improvement of CVM-Loop4 over BP is slightly less significant, possibly 
caused by the existence of the central node with a large degree. CVM-Loop6 does not 
converge for instances with d > 4 before 10^ seconds and is not included in the analysis. 
We can say that all approaches present results independent of the degree d. 

The approximation is the best method compared to the other tested approaches. 
The improvements of Zg on CVM-Loop4 (the second best method) can be of more than 
two orders of magnitude and more than three orders of magnitude compared to BP. 

Computational costs are shown in 9b. The best performance of comes at the cost 
of being the most expensive approximate inference approach for which we obtain results. 
Again, for larger graphs, exact solution via the junction tree is not feasible due to the large 
tree-width. 

4. Discussion 

We have presented an approximate algorithm based on the exact loop calculus framework 
for inference on planar graphical models defined in terms of binary variables. The pro- 
posed approach improves the estimate for the partition function provided by BP without 
an explicit search of loops. 

The algorithm is illustrated on the example of ordered and disordered Ising model on 
a planar graph. Performance of the method is analyzed in terms of its dependence on the 
system size. The complexity of the partition function computation is exponential in the 
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general case, unless the local fields are zero, when it becomes polynomial. We tested our 
algorithm on regular grids and planar graphs with different structures. Our experiments 
on regular grids show that significant improvements over BP are always obtained if single 
variable potentials (local magnetic fields) are sufficiently large. The quality of this correction 
degrades with decrease in the amplitude of external field, to become exact at zero external 
fields. This suggests that the difficulty of the inference task changes abruptly from very 
easy, with no local fields, to very hard, with small local fields, and then decays again as 
external fields become larger. 

The ^0 correction turns out to be competitive with other state of the art methods for 
approximate inference of the partition function. First of all. wc showed that Z^j is much 
more accurate than upper bounds based methods such as TRW or PDC. This illustrates 
that such methods come at the cost of less accurate approximations. We have also shown 
that for the case of grids with attractive interactions, the lower bound provided by ^0 is 
the most accurate. 

Secondly, we found that Zij) performs much better than treeEP for weak and intermediate 
couplings and shows competitive results for strong interactions. Concerning CVM, we 
showed that using larger outer clusters does not necessarily lead to better approximations. 
In general, the correction presented better results than CVM for our choice of regions. 

Finally, we have presented a comparison of ^0 with TLSBP, which is another algorithm 
for the BP-based loop scries using the loop length as truncation parameter. On the one 
hand, the calculation of Zij) involves a re-summation of many loop terms which in the case 
of TLSBP are summed individually. This consideration favors the Zgf approach. On the 
other hand, Z^ is restricted to the class of 2-regular loops whereas TLSBP also accounts 
for terms corresponding to more complex loop structures in which nodes can have degree 
larger than two. Overall, for planar graphs, we have shown evidence that the Z0 approach is 
better than TLSBP when the size of the graphs is not very small. We emphasize, however, 
that TLSBP can be applied to non-planar binary graphical models too. 

Currently, the shortcoming of the presented approach is in its relatively costly imple- 
mentation. However, since the bottleneck of the algorithm is the Pfaffian calculation and 
not the algorithm itself (used to obtain the extended graphs and the associated matrices), 
it is easy to devise more efficient methods than the one used here. Thus, one may substitute 
brute-force evaluation of the Pfaffians by a smarter one available for planar graphs. This 
reduces the cost from 0{N^) to 0(7V3/2) (Galluccio et al., 2000; Loh and Carlson, 2006). 
Besides, the Pfaffian of B is binary, see Eq. (6), making it possible to improve using a 
bit-matrix representation (Schraudolph and Kamenetsky, 2008). Alternatively one could 
think of a strategy which does not require the Pfaffian of B. All these technical issues are 
the focus of our continuing investigation. 

In this manuscript we have focused on inference problems defined on planar graphs with 
symmetric pairwise interactions and, to make the problems difficult, we have introduced 
local field potentials. Notice however, that the algorithm can also be used to solve models 
with more complex interactions, i.e. more than pairwise as in the case of the Ising model 
(see Chertkov et al., 2008, for a discussion of possible generalizations). This makes our 
approach more powerful than other approaches, namely, (Globerson and Jaakkola, 2007; 
Schraudolph and Kamenetsky, 2008), designed specifically for the pairwise interaction case. 
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Although planarity is a severe restriction, we emphasize that planar graphs appear in 
many contexts such as computer vision and image processing, magnetic and optical record- 
ing, or network routing and logistics. It would also be interesting (and possible) to consider 
extensions of the algorithm developed in the manuscript for approximate inference of some 
class of non-planar graphs. Thus, following the approach of Globerson and Jaakkola (2007), 
one can think of other types of spanning subgraphs more general than " easy" planar graphs 
for which exact computation can be performed using perfect matching. The correction 
can be an accurate approximation for this spanning subgraphs and the resulting approxi- 
mation method would also provide bounds on the exact result. 
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Appendix A: Converting a factor graph to a Forney Graph. 

A probabilistic model is usually represented as a Bayesian Network or a Markov Random 
Field. Since bipartite factor graphs subsume both models, we show here how to convert 
a factor graph model defined in terms of binary variables to a more general Forney graph 
representation, for which the presented algorithm can be directly applied to. 

On a bipartite factor graph Qjr = (Vjf, (Sjf) the set Vjf is composed of a set of variable 
nodes X and a set of factor nodes J . Each variable node i G T, i := {1,2,...} represents 
a variable which takes values (jj = {±1}. We label factor nodes using capital letters so 
that a = {A, B, . . .},a ^ J' denotes a factor node which has an associated function fa{ca) 
defined on a subset of variables a E I. An (undirected) edge exists between two nodes 
(a, i) G if i £ a. 

Given a direct way to obtain an equivalent Forney graph Q is: first, to create a 
node Si eV for each variable node i G V^, and second, to associate a new binary variable 
6ia with values a^.a = {±1} to edges {Si, a) G £. Nodes Si E V are equivalent factor nodes 
denoting the characteristic function: Si{cra) = 1 if crs-a = crSibi Va, b £ Si and zero otherwise. 
Finally, factor nodes c G correspond to the same factor nodes c in V but defined in 
terms of the new variables SiC, Vz G c. 

Figure 10 shows an example of this transformation. Notice that, although we impose 

an direction in the edge labels, they remain undirected: (Si, a) = (a, Si), V5i,a G V. For 
variables i G Vjf which only appear in two factors, such as variable 3, the corresponding ^3 
node is redundant and can be removed. The joint distribution of Qj^ is related to the joint 
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Figure 10: (a) An factor graph Qj^ and (b) an equivalent Forney graph Q. 

distribution of Q by: 

^/A(cri)/B(cr2)/c(o-i, cr2)/D((Ti, o-3)/£;(o-2, 0-3) (7) 

= -^fA{(^5iA)fB{(^S2B)fc{(^SiC, (^52c) fD{(^SiD, (^S,iD)fE{(^S2E, (^Sse) 

/<5i(o"<5i^, (^SiC, '7SiD)fs2{(^S2B,<y52C,<^52E)fs3{(^53D,<y53E)- 

Once Q has been generated following the previous procedure it may be the case that the 
nodes 5i £V have degree three or larger. This happens if a variable i appears in more than 
3 factor nodes on Qjr. It is easy to convert ^ to a graph were all Si nodes have maximum 
degree three by introducing new auxiliary variables 61^,61^,... and new equivalent nodes. 
For instance, if variable i G appears in 4 factors A, B, C, D: 

fSi{(^SiA, crSiB, (^SiC (^SiO) = fSi^ {<^SiA, crSiB, (^Si^ )fSi^ {(^Si-^ , (^SiC crSiO)- 

Notice that although the models are equivalent, the number of loops in Q may be larger 
than in Qj^. In the case that a factor in Q^r involves more than three variables, as sketched 

in Chcrtkov et al. (2008), one could split the node of degree N into auxiliary nodes of degree 
N —1 and compute ^0 on the transformed model. Alternatively, one can reduce the number 
of variables that enter a factor by clamping. 
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